55 research outputs found

    A Large-scale Film Style Dataset for Learning Multi-frequency Driven Film Enhancement

    Full text link
    Film, a classic image style, is culturally significant to the whole photographic industry since it marks the birth of photography. However, film photography is time-consuming and expensive, necessitating a more efficient method for collecting film-style photographs. The numerous datasets that have emerged in the field of image enhancement so far are not film-specific. To facilitate film-based image stylization research, we construct FilmSet, a large-scale and high-quality film style dataset. Our dataset includes three different film types and more than 5000 in-the-wild high-resolution images. Inspired by the features of FilmSet images, we propose a novel framework called FilmNet, based on the Laplacian Pyramid, for stylizing images across frequency bands and achieving film style outcomes. Experiments reveal that the performance of our model is superior to state-of-the-art techniques. Our dataset and code will be made publicly available.
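    The multi-frequency processing described above is built on a standard Laplacian Pyramid decomposition. The sketch below is not the authors' FilmNet code; the input path, the number of levels, and the identity per-band step are placeholder assumptions, shown only to illustrate how an image is split into frequency bands and reconstructed.

```python
# Minimal Laplacian Pyramid split/merge sketch (OpenCV); per-band stylization
# would replace the identity step marked below.
import cv2
import numpy as np

def build_laplacian_pyramid(img: np.ndarray, levels: int = 3):
    """Decompose an image into `levels` band-pass layers plus a low-pass residual."""
    pyramid, current = [], img.astype(np.float32)
    for _ in range(levels):
        down = cv2.pyrDown(current)
        up = cv2.pyrUp(down, dstsize=(current.shape[1], current.shape[0]))
        pyramid.append(current - up)   # high-frequency band
        current = down
    pyramid.append(current)            # low-frequency residual
    return pyramid

def reconstruct(pyramid):
    """Collapse the pyramid back into a full-resolution image."""
    current = pyramid[-1]
    for band in reversed(pyramid[:-1]):
        current = cv2.pyrUp(current, dstsize=(band.shape[1], band.shape[0])) + band
    return current

img = cv2.imread("photo.jpg")          # hypothetical input file
bands = build_laplacian_pyramid(img)
# A frequency-driven model would transform each band (e.g. with a small CNN)
# before reconstruction; here the bands are passed through unchanged.
restored = reconstruct(bands)
```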

    High-Resolution Document Shadow Removal via A Large-Scale Real-World Dataset and A Frequency-Aware Shadow Erasing Net

    Full text link
    Shadows often occur when we capture documents with casual equipment, which degrades the visual quality and readability of the digital copies. Unlike algorithms for natural shadow removal, algorithms for document shadow removal need to preserve the details of fonts and figures in high-resolution input. Previous works ignore this problem and remove the shadows via approximate attention and small datasets, which might not work in real-world situations. We handle high-resolution document shadow removal directly via a large-scale real-world dataset and a carefully designed frequency-aware network. As for the dataset, we acquire over 7k pairs of high-resolution (2462 x 3699) real-world document images with various samples under different lighting circumstances, which is 10 times larger than existing datasets. As for the design of the network, we decouple the high-resolution images in the frequency domain, where the low-frequency details and high-frequency boundaries can be effectively learned via the carefully designed network structure. Powered by our network and dataset, the proposed method clearly shows better performance than previous methods in terms of visual quality and numerical results. The code, models, and dataset are available at: https://github.com/CXH-Research/DocShadow-SD7K. Comment: Accepted by the International Conference on Computer Vision 2023 (ICCV 2023).
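    The frequency-domain decoupling mentioned above can be illustrated with a simple low-pass/high-pass split. The snippet below only shows that split; the paper's network learns the two parts with dedicated branches, and the file name and Gaussian sigma here are arbitrary assumptions.

```python
# Split a document photo into a smooth illumination/shadow layer (low frequency)
# and a detail layer of fonts and figure boundaries (high frequency).
import cv2
import numpy as np

doc = cv2.imread("shadowed_page.png").astype(np.float32)   # hypothetical input
low_freq = cv2.GaussianBlur(doc, (0, 0), sigmaX=15)        # illumination / shadow layer
high_freq = doc - low_freq                                 # fonts and boundaries
# A frequency-aware model would restore `low_freq` (remove the shadow) while
# preserving `high_freq`, then recombine:
restored = low_freq + high_freq
```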

    ShaDocFormer: A Shadow-attentive Threshold Detector with Cascaded Fusion Refiner for document shadow removal

    Full text link
    Document shadow is a common issue that arises when capturing documents using mobile devices, and it significantly impacts readability. Current methods encounter various challenges, including inaccurate detection of shadow masks and estimation of illumination. In this paper, we propose ShaDocFormer, a Transformer-based architecture that integrates traditional methodologies and deep learning techniques to tackle the problem of document shadow removal. The ShaDocFormer architecture comprises two components: the Shadow-attentive Threshold Detector (STD) and the Cascaded Fusion Refiner (CFR). The STD module employs a traditional thresholding technique and leverages the attention mechanism of the Transformer to gather global information, thereby enabling precise detection of shadow masks. The cascaded and aggregative structure of the CFR module facilitates a coarse-to-fine restoration process for the entire image. As a result, ShaDocFormer excels in accurately detecting and capturing variations in both shadow and illumination, thereby enabling effective removal of shadows. Extensive experiments demonstrate that ShaDocFormer outperforms current state-of-the-art methods in both qualitative and quantitative measurements.
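    For context on the "traditional thresholding" ingredient of the STD module, the snippet below computes a coarse shadow mask with a global Otsu threshold. The actual STD refines such a mask with Transformer attention; that refinement is not reproduced here, and the input file name is a placeholder.

```python
# Coarse shadow-candidate mask via Otsu thresholding on the grayscale document.
import cv2

gray = cv2.imread("document.jpg", cv2.IMREAD_GRAYSCALE)    # hypothetical input
# Pixels darker than the Otsu threshold are treated as shadow candidates.
_, shadow_mask = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
```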

    UWFormer: Underwater Image Enhancement via a Semi-Supervised Multi-Scale Transformer

    Full text link
    Underwater images often exhibit poor quality, imbalanced coloration, and low contrast due to the complex and intricate interaction of light, water, and objects. Despite the significant contributions of previous underwater enhancement techniques, several problems demand further improvement: (i) current deep learning methodologies depend on Convolutional Neural Networks (CNNs) that lack multi-scale enhancement and have limited global perception fields; (ii) the scarcity of paired real-world underwater datasets poses a considerable challenge, and the use of synthetic image pairs risks overfitting. To address these issues, this paper presents a multi-scale Transformer-based network called UWFormer for enhancing images at multiple frequencies via semi-supervised learning, in which we propose a Nonlinear Frequency-aware Attention mechanism and a Multi-Scale Fusion Feed-forward Network for low-frequency enhancement. Additionally, we introduce a specialized underwater semi-supervised training strategy, proposing a Subaqueous Perceptual Loss function to generate reliable pseudo labels. Experiments on full-reference and no-reference underwater benchmarks demonstrate that our method outperforms state-of-the-art methods in terms of both quantitative metrics and visual quality.
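    The semi-supervised strategy above relies on pseudo labels for unpaired images. The sketch below shows a generic teacher-student pseudo-labeling loop of that kind; the stand-in network, L1 losses, loss weight, and EMA decay are assumptions and do not reproduce the UWFormer architecture or the Subaqueous Perceptual Loss.

```python
# Generic EMA teacher-student loop: the teacher pseudo-labels unpaired inputs,
# the student trains on paired data plus pseudo-labelled data.
import copy
import torch
import torch.nn.functional as F

student = torch.nn.Conv2d(3, 3, 3, padding=1)   # stand-in for the enhancement network
teacher = copy.deepcopy(student)
for p in teacher.parameters():
    p.requires_grad_(False)
optimizer = torch.optim.Adam(student.parameters(), lr=1e-4)

def train_step(paired_in, paired_gt, unpaired_in, ema_decay=0.999):
    # Supervised branch on real paired data.
    sup_loss = F.l1_loss(student(paired_in), paired_gt)
    # Unsupervised branch: the teacher output serves as a pseudo label.
    with torch.no_grad():
        pseudo_gt = teacher(unpaired_in)
    unsup_loss = F.l1_loss(student(unpaired_in), pseudo_gt)
    loss = sup_loss + 0.5 * unsup_loss          # weighting is an arbitrary choice
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    # Update the teacher as an exponential moving average of the student.
    with torch.no_grad():
        for t_p, s_p in zip(teacher.parameters(), student.parameters()):
            t_p.mul_(ema_decay).add_(s_p, alpha=1 - ema_decay)
    return loss.item()

# Toy call with random tensors standing in for real batches.
loss = train_step(torch.randn(2, 3, 64, 64), torch.randn(2, 3, 64, 64), torch.randn(2, 3, 64, 64))
```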

    DocDeshadower: Frequency-aware Transformer for Document Shadow Removal

    Full text link
    The presence of shadows significantly impacts the visual quality of scanned documents. However, existing traditional techniques and deep learning methods for shadow removal have several limitations: they either rely heavily on heuristics, resulting in suboptimal performance, or require large datasets to learn shadow-related features. In this study, we propose DocDeshadower, a multi-frequency Transformer-based model built on the Laplacian Pyramid. DocDeshadower is designed to remove shadows at different frequencies in a coarse-to-fine manner. To achieve this, we decompose the shadow image into different frequency bands using a Laplacian Pyramid. In addition, we introduce two novel components: the Attention-Aggregation Network and the Gated Multi-scale Fusion Transformer. The Attention-Aggregation Network removes shadows in the low-frequency part of the image, whereas the Gated Multi-scale Fusion Transformer refines the entire image at a global scale with its large receptive field. Our extensive experiments demonstrate that DocDeshadower outperforms current state-of-the-art methods in both qualitative and quantitative terms.
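    As a rough illustration of the kind of gating used when fusing features from different scales, the block below blends two feature maps with a learned per-pixel gate. It is a generic gated-fusion sketch, not the authors' Gated Multi-scale Fusion Transformer, and the channel count is an assumption.

```python
# Generic gated fusion of a coarse-scale and a fine-scale feature map.
import torch
import torch.nn as nn

class GatedFusion(nn.Module):
    def __init__(self, channels: int = 64):
        super().__init__()
        # Predict a per-pixel gate from the concatenated feature maps.
        self.gate = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, coarse: torch.Tensor, fine: torch.Tensor) -> torch.Tensor:
        g = self.gate(torch.cat([coarse, fine], dim=1))
        # Blend the two scales: g weights the coarse branch, 1-g the fine one.
        return g * coarse + (1 - g) * fine

fusion = GatedFusion(64)
out = fusion(torch.randn(1, 64, 32, 32), torch.randn(1, 64, 32, 32))
```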

    DiffGAN-F2S: Symmetric and Efficient Denoising Diffusion GANs for Structural Connectivity Prediction from Brain fMRI

    Full text link
    Mapping from functional connectivity (FC) to structural connectivity (SC) can facilitate multimodal brain network fusion and discover potential biomarkers for clinical implications. However, it is challenging to directly bridge the reliable non-linear mapping relations between SC and functional magnetic resonance imaging (fMRI). In this paper, a novel diffusion generative adversarial network-based fMRI-to-SC (DiffGAN-F2S) model is proposed to predict SC from brain fMRI in an end-to-end manner. Specifically, the proposed DiffGAN-F2S leverages denoising diffusion probabilistic models (DDPMs) and adversarial learning to efficiently generate high-fidelity SC from fMRI in a few steps. By designing the dual-channel multi-head spatial attention (DMSA) and graph convolutional modules, the symmetric graph generator first captures global relations among directly and indirectly connected brain regions, then models the local brain region interactions. It can uncover the complex mapping relations between fMRI and structural connectivity. Furthermore, a spatially connected consistency loss is devised to constrain the generator to preserve global-local topological information for accurate intrinsic SC prediction. Tested on the public Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset, the proposed model can effectively generate empirical SC-preserved connectivity from four-dimensional imaging data and shows superior performance in SC prediction compared with other related models. Furthermore, the proposed model can identify the vast majority of important brain regions and connections derived from the empirical method, providing an alternative way to fuse multimodal brain networks and analyze clinical disease. Comment: 12 pages.
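    For background on the DDPM machinery such diffusion-based generators build on, the sketch below applies the standard forward (noising) step to a toy connectivity matrix and shows one simple way to keep a predicted matrix symmetric. The schedule length, beta range, region count, and symmetrization choice are assumptions, not details taken from DiffGAN-F2S.

```python
# Standard DDPM forward step q(x_t | x_0) applied to a toy SC matrix,
# plus an explicit symmetrization with a zero diagonal.
import torch

T = 1000
betas = torch.linspace(1e-4, 0.02, T)              # common linear schedule
alpha_bars = torch.cumprod(1.0 - betas, dim=0)     # cumulative products of (1 - beta_t)

def q_sample(sc0: torch.Tensor, t: int) -> torch.Tensor:
    """Sample x_t = sqrt(abar_t) * x_0 + sqrt(1 - abar_t) * noise."""
    noise = torch.randn_like(sc0)
    return alpha_bars[t].sqrt() * sc0 + (1 - alpha_bars[t]).sqrt() * noise

def symmetrize(adj: torch.Tensor) -> torch.Tensor:
    """Force a connectivity matrix to be symmetric with a zero diagonal."""
    sym = 0.5 * (adj + adj.transpose(-1, -2))
    return sym - torch.diag_embed(torch.diagonal(sym, dim1=-2, dim2=-1))

sc0 = symmetrize(torch.rand(90, 90))               # toy 90-region SC matrix
noisy = q_sample(sc0, t=500)
```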

    Forecasting the COVID-19 transmission in Italy based on the minimum spanning tree of dynamic region network

    Get PDF
    Background: Italy surpassed 1.5 million confirmed Coronavirus Disease 2019 (COVID-19) infections on November 26, as its death toll rose rapidly in the second wave of the COVID-19 outbreak, placing a heavy burden on hospitals. It is therefore necessary to forecast and provide early warning of potential COVID-19 outbreaks, which facilitates the timely implementation of appropriate control measures. However, real-time prediction of COVID-19 transmission and outbreaks is usually challenging because of its complexity, which intertwines biological and social systems. Methods: By mining dynamical information from region networks and short-term time series data, we developed a data-driven model, the minimum-spanning-tree-based dynamical network marker (MST-DNM), to quantitatively analyze and monitor the dynamical process of COVID-19 spreading. Specifically, we collected the historical record of daily cases caused by COVID-19 infection in Italy from February 24, 2020 to November 28, 2020. When applied to the region network of Italy, the MST-DNM model has the ability to monitor the whole process of COVID-19 transmission and successfully identify the early-warning signals. The interpretability and practical significance of our model are explained in detail in this study. Results: The study of the dynamical changes of Italian region networks reveals the dynamics of COVID-19 transmission at the network level. It is noteworthy that the driving force of MST-DNM relies only on small samples rather than years of time series data. Therefore, it has great potential in public surveillance for emerging infectious diseases.
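    A rough sketch of this kind of pipeline is shown below: build a region network from short sliding windows of daily case counts, take its minimum spanning tree, and track a summary statistic over time as a candidate early-warning signal. The window length, the correlation-distance edge weights, and the use of the total MST weight as the index are simplifying assumptions, not the paper's exact MST-DNM score.

```python
# Sliding-window region network -> minimum spanning tree -> summary index.
import numpy as np
import networkx as nx

def mst_index(cases: np.ndarray, window: int = 7) -> np.ndarray:
    """cases: array of shape (days, regions) with daily confirmed counts."""
    days, regions = cases.shape
    index = []
    for end in range(window, days + 1):
        seg = cases[end - window:end]                    # short time series per region
        corr = np.nan_to_num(np.corrcoef(seg.T))         # region-by-region correlation
        g = nx.Graph()
        for i in range(regions):
            for j in range(i + 1, regions):
                g.add_edge(i, j, weight=1.0 - abs(corr[i, j]))  # distance-like weight
        mst = nx.minimum_spanning_tree(g)
        index.append(mst.size(weight="weight"))          # total MST edge weight
    return np.array(index)

# Usage with toy data: 60 days, 20 regions.
signal = mst_index(np.random.poisson(5.0, size=(60, 20)))
```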

    Generative AI for brain image computing and brain network computing: a review

    Get PDF
    Recent years have witnessed significant advancement in brain imaging techniques that offer a non-invasive approach to mapping the structure and function of the brain. Concurrently, generative artificial intelligence (AI) has experienced substantial growth, using existing data to create new content with an underlying pattern similar to real-world data. The integration of these two domains, generative AI in neuroimaging, presents a promising avenue for exploring various fields of brain imaging and brain network computing, particularly for extracting spatiotemporal brain features and reconstructing the topological connectivity of brain networks. Therefore, this study reviews the advanced models, tasks, challenges, and prospects of brain imaging and brain network computing techniques, and intends to provide a comprehensive picture of current generative AI techniques in brain imaging. The review focuses on novel methodological approaches and applications of related new methods. It discusses the fundamental theories and algorithms of four classic generative models and provides a systematic survey and categorization of tasks, including co-registration, super-resolution, enhancement, classification, segmentation, cross-modality, brain network analysis, and brain decoding. The paper also highlights the challenges and future directions of the latest work, with the expectation that future research can benefit from it.

    Exploring Indoor White Spaces in Metropolises

    No full text
    It is a promising vision to utilize white spaces, i.e., vacant VHF and UHF TV channels, to satisfy skyrocketing wireless data demand in both outdoor and indoor scenarios. While most prior works have focused on exploring outdoor white spaces, the indoor story is largely open for investigation. Motivated by this observation, and by the fact that 70% of spectrum demand comes from indoor environments, we carry out a comprehensive study of exploring indoor white spaces. We first present a large-scale measurement of outdoor and indoor TV spectrum occupancy in 30+ diverse locations in a typical metropolis, Hong Kong. Our measurement results confirm abundant white spaces available for exploration in a wide range of areas in metropolises. In particular, more than 50% and 70% of the TV spectrum are white spaces in outdoor and indoor scenarios, respectively. While there are substantially more white space